Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time

Authors

  • Matti Karppa
  • Petteri Kaski
  • Jukka Kohonen
  • Padraig Ó Catháin
Abstract

We derandomize G. Valiant’s [J. ACM 62 (2015) Art. 13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant’s randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math. 155 (2002) 157–187]. We say that a function $f : \{-1, 1\}^d \to \{-1, 1\}^D$ is a correlation amplifier with threshold $0 \le \tau \le 1$, error $\gamma \ge 1$, and strength $p$ an even positive integer if for all pairs of vectors $x, y \in \{-1, 1\}^d$ it holds that (i) $|\langle x, y\rangle| < \tau d$ implies $|\langle f(x), f(y)\rangle| \le (\tau\gamma)^p D$; and (ii) $|\langle x, y\rangle| \ge \tau d$ implies $\left(\frac{\langle x, y\rangle}{\gamma d}\right)^p D \le \langle f(x), f(y)\rangle \le \left(\frac{\gamma\langle x, y\rangle}{d}\right)^p D$.

1998 ACM Subject Classification: F.2.1 Numerical Algorithms and Problems
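To make the amplifier definition concrete, the following is a minimal sketch that checks conditions (i) and (ii) by brute force on a tiny cube. It uses the full p-fold tensor power as the map f, which satisfies ⟨f(x), f(y)⟩ = ⟨x, y⟩^p exactly and so meets the definition with γ = 1 and D = d^p. This is only an illustration of the definition, not the expander-based construction the abstract refers to; the helper names `inner`, `tensor_power`, and `is_amplifier` are hypothetical and chosen here for the example.

```python
from itertools import product


def inner(x, y):
    """Inner product of two equal-length +/-1 sequences."""
    return sum(a * b for a, b in zip(x, y))


def tensor_power(x, p):
    """p-fold tensor power of x: one coordinate per p-tuple of indices (length d**p)."""
    out = [1]
    for _ in range(p):
        out = [a * b for a in out for b in x]
    return out


def is_amplifier(f, d, D, p, tau, gamma, eps=1e-9):
    """Exhaustively test conditions (i) and (ii) over all pairs in {-1, 1}^d.

    Assumption for this sketch: d is small enough that enumerating
    all 4**d pairs is feasible. eps absorbs floating-point rounding.
    """
    for x in product((-1, 1), repeat=d):
        fx = f(x)
        for y in product((-1, 1), repeat=d):
            s, t = inner(x, y), inner(fx, f(y))
            if abs(s) < tau * d:
                # Condition (i): small correlations stay small.
                if abs(t) > (tau * gamma) ** p * D + eps:
                    return False
            else:
                # Condition (ii): large correlations are amplified within error gamma.
                lo = (s / (gamma * d)) ** p * D
                hi = (gamma * s / d) ** p * D
                if not (lo - eps <= t <= hi + eps):
                    return False
    return True


if __name__ == "__main__":
    d, p = 4, 2
    print(is_amplifier(lambda x: tensor_power(x, p), d, d ** p, p, 0.5, 1.0))
```

The tensor power pays for its exactness with an output dimension of d^p, which is why a construction with a smaller D and an error γ slightly above 1 is of interest.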


Similar resources

A Faster Subquadratic Algorithm for Finding Outlier Correlations

We study the problem of detecting outlier pairs of strongly correlated variables among a collection of n variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of n vectors with unit Euclidean norm and dimension d, and we are asked to find all the outlier pairs of vectors whose inner product is at least...


A statistical test for outlier identification in data envelopment analysis

In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these “deterministic” frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...


Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...


On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity

Point location problems for n points in d-dimensional Euclidean space (and $\ell_p$ spaces more generally) have typically had two kinds of running-time solutions: (Nearly-Linear) less than d · n log n time, or (Barely-Subquadratic) $f(d) \cdot n^{2-1/\Theta(d)}$ time, for various f. For small d and large n, “nearly-linear” running times are generally feasible, while the “barely-subquadratic” times are generally in...




Publication date: 2016